Binary neural networks are the extreme case of network quantization, which has long been thought of as a potential edge machine learning solution. However, the significant accuracy gap to the full-precision counterparts restricts their creative potential for mobile applications. In this work, we revisit the potential of binary neural networks and focus on a compelling but unanswered problem: how can a binary neural network achieve the crucial accuracy level (e.g., 80%) on ILSVRC-2012 ImageNet? We achieve this goal by enhancing the optimization process from three complementary perspectives: (1) We design a novel binary architecture BNext based on a comprehensive study of binary architectures and their optimization process. (2) We propose a novel knowledge-distillation technique to alleviate the counter-intuitive overfitting problem observed when attempting to train extremely accurate binary models. (3) We analyze the data augmentation pipeline for binary networks and modernize it with up-to-date techniques from full-precision models. The evaluation results on ImageNet show that BNext, for the first time, pushes the binary model accuracy boundary to 80.57% and significantly outperforms all the existing binary networks. Code and trained models are available at: https://github.com/hpi-xnor/BNext.git.
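As a minimal illustration of what "the extreme case of network quantization" means, the sketch below reduces a weight tensor to one bit per weight plus a single scaling factor. It is a generic sign-based scheme chosen for illustration, not the BNext method, which the abstract does not detail; the function name `binarize_weights` and the mean-absolute-value scale are assumptions.

```python
import numpy as np

def binarize_weights(w):
    """Return (signs, alpha): a {-1, +1} tensor and one real-valued scale.

    Assumption: alpha is the mean absolute weight, a common choice in the
    binary-network literature; it is not taken from the abstract above.
    """
    alpha = float(np.mean(np.abs(w)))
    signs = np.where(w >= 0, 1.0, -1.0)  # zero is mapped to +1
    return signs, alpha

# Hypothetical 2x2 weight matrix used only for demonstration.
w = np.array([[0.4, -0.2], [-0.6, 0.8]])
signs, alpha = binarize_weights(w)
approx = alpha * signs  # 1-bit approximation of w
print(signs.tolist(), alpha)
```

The gap between `w` and `approx` is the quantization error that a binary network has to absorb during training.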
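Novel class discovery (NCD) aims to infer novel categories in an unlabeled dataset by leveraging prior knowledge from a labeled set that contains disjoint but related classes. Existing research focuses primarily on how to use the labeled set at the methodological level, with less emphasis on analyzing the labeled set itself. In this paper, we therefore rethink novel class discovery from the perspective of the labeled set and focus on two core questions: (i) Given a specific unlabeled set, what kind of labeled set best supports novel class discovery? (ii) A fundamental premise of NCD is that the labeled set must be related to the unlabeled set, but how can we measure this relation? For (i), we propose and substantiate the hypothesis that NCD benefits more from a labeled set that is semantically similar to the unlabeled set. Specifically, we establish an extensive, large-scale benchmark on ImageNet with varying degrees of semantic similarity between the labeled and unlabeled sets by leveraging its hierarchical class structure. In sharp contrast, existing NCD benchmarks are built from labeled sets that differ in their numbers of categories and images, and they completely ignore the semantic relation. For (ii), we introduce a mathematical definition that quantifies the semantic similarity between labeled and unlabeled sets. We use this metric to confirm the validity of our proposed benchmark and show that it correlates highly with NCD performance. Furthermore, previous works commonly assume, without quantitative analysis, that label information is always beneficial. Counterintuitively, our experimental results show that using labels may lead to sub-optimal outcomes in low-similarity settings.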
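The challenging field of scene text detection requires complex data annotation, which is time-consuming and expensive. Techniques such as weak supervision can reduce the amount of data needed. In this paper, we propose a weak supervision method for scene text detection that makes use of reinforcement learning (RL). The reward received by the RL agent is estimated by a neural network instead of being inferred from ground-truth labels. First, we enhance an existing supervised RL approach to text detection with several training optimizations, allowing us to close the performance gap to regression-based algorithms. We then use the proposed system for weakly supervised and semi-supervised training on real-world data. Our results show that training in a weakly supervised setting is feasible. However, we find that using our model in a semi-supervised setting, for example by combining labeled synthetic data with unannotated real-world data, produces the best results.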
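Contrastive Language-Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image-text pairs collected online. So far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. This work evaluates the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). To this end, we present PubMedCLIP, a version of CLIP fine-tuned for the medical domain on PubMed articles. Our experiments are conducted on two MedVQA benchmark datasets and investigate two MedVQA methods, MEVF (Mixture of Enhanced Visual Features) and QCR (Question answering via Conditional Reasoning). For each of these, we assess the merits of visual representation learning using PubMedCLIP, the original CLIP, and state-of-the-art MAML (Model-Agnostic Meta-Learning) networks pre-trained only on visual data. We open-source the code for our MedVQA pipeline and for pre-training PubMedCLIP. CLIP and PubMedCLIP achieve improvements over MAML's visual encoder. PubMedCLIP achieves the best results, with gains in overall accuracy of up to 3%. Individual examples illustrate the strengths of PubMedCLIP in comparison with the previously widely used MAML networks. Visual representation learning with language supervision in PubMedCLIP leads to noticeable improvements for MedVQA. Our experiments reveal distributional differences between the two MedVQA benchmark datasets that have not been reported in previous work and that cause different back-end visual encoders in PubMedCLIP to behave differently on these datasets. Moreover, we observe fundamental performance differences between VQA in the general domain and in the medical domain.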
Deep learning of artificial neural networks (ANNs) is creating highly functional tools that are, unfortunately, as hard to interpret as their natural counterparts. While it is possible to identify functional modules in natural brains using technologies such as fMRI, we do not have at our disposal similarly robust methods for artificial neural networks. Ideally, understanding which parts of an artificial neural network perform what function might help us to address a number of vexing problems in ANN research, such as catastrophic forgetting and overfitting. Furthermore, revealing a network's modularity could improve our trust in such networks by making these black boxes more transparent. Here we introduce a new information-theoretic concept that proves useful in understanding and analyzing a network's functional modularity: the relay information $I_R$. The relay information measures how much information groups of neurons that participate in a particular function (modules) relay from inputs to outputs. Combined with a greedy search algorithm, relay information can be used to identify computational modules in neural networks. We also show that the functionality of modules correlates with the amount of relay information they carry.
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
Local patterns play an important role in statistical physics as well as in image processing. Two-dimensional ordinal patterns were studied by Ribeiro et al. who determined permutation entropy and complexity in order to classify paintings and images of liquid crystals. Here we find that the 2 by 2 patterns of neighboring pixels come in three types. The statistics of these types, expressed by two parameters, contains the relevant information to describe and distinguish textures. The parameters are most stable and informative for isotropic structures.
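The abstract does not say how the three types are defined. One natural way to obtain exactly three classes, assumed here and not taken from the source, is to group the 24 possible rank orderings of a 2 by 2 window by the symmetries of the square. Under rotations and reflections, the only invariant is which rank sits diagonally opposite the smallest value, which gives three classes of eight patterns each. The function names and the labels 2, 3, 4 below are illustrative.

```python
from itertools import permutations

def pattern_type(a, b, c, d):
    """Classify the window [[a, b], [c, d]] by the rank (2, 3, or 4) of the
    value diagonally opposite its minimum. Ties are broken by position,
    which is an arbitrary choice; distinct values are assumed."""
    window = [a, b, c, d]  # row-major; index i is diagonal to index 3 - i
    order = sorted(range(4), key=lambda i: window[i])
    rank = {idx: r + 1 for r, idx in enumerate(order)}  # rank 1 = smallest
    return rank[3 - order[0]]

def type_counts(image):
    """Count pattern types over all overlapping 2x2 windows of a 2-D list."""
    counts = {2: 0, 3: 0, 4: 0}
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            t = pattern_type(image[r][c], image[r][c + 1],
                             image[r + 1][c], image[r + 1][c + 1])
            counts[t] += 1
    return counts

# Sanity check: the 24 orderings split evenly into three classes.
census = {2: 0, 3: 0, 4: 0}
for p in permutations([1, 2, 3, 4]):
    census[pattern_type(*p)] += 1
print(census)  # {2: 8, 3: 8, 4: 8}
```

Since the three relative frequencies returned by `type_counts` sum to one, they leave two free quantities, which matches the abstract's statement that the type statistics are expressed by two parameters. The specific parameters used by the authors are not given in the abstract.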
It is well known that conservative mechanical systems exhibit local oscillatory behaviours due to their elastic and gravitational potentials, which completely characterise these periodic motions together with the inertial properties of the system. The classification of these periodic behaviours and their geometric characterisation are in an on-going secular debate, which recently led to the so-called eigenmanifold theory. The eigenmanifold characterises nonlinear oscillations as a generalisation of linear eigenspaces. With the motivation of performing periodic tasks efficiently, we use tools coming from this theory to construct an optimization problem aimed at inducing desired closed-loop oscillations through a state feedback law. We solve the constructed optimization problem via gradient-descent methods involving neural networks. Extensive simulations show the validity of the approach.
Artificial intelligence (AI) systems based on deep neural networks (DNNs) and machine learning (ML) algorithms are increasingly used to solve critical problems in bioinformatics, biomedical informatics, and precision medicine. However, complex DNN or ML models that are unavoidably opaque and perceived as black-box methods may not be able to explain why and how they make certain decisions. Such black-box models are difficult to comprehend not only for targeted users and decision-makers but also for AI developers. Besides, in sensitive areas like healthcare, explainability and accountability are not only desirable properties of AI but also legal requirements -- especially when AI may have significant impacts on human lives. Explainable artificial intelligence (XAI) is an emerging field that aims to mitigate the opaqueness of black-box models and make it possible to interpret how AI systems make their decisions with transparency. An interpretable ML model can explain how it makes predictions and which factors affect the model's outcomes. The majority of state-of-the-art interpretable ML methods have been developed in a domain-agnostic way and originate from computer vision, automated reasoning, or even statistics. Many of these methods cannot be directly applied to bioinformatics problems without prior customization, extension, and domain adaptation. In this paper, we discuss the importance of explainability with a focus on bioinformatics. We analyse and comprehensively review model-specific and model-agnostic interpretable ML methods and tools. Via several case studies covering bioimaging, cancer genomics, and biomedical text mining, we show how bioinformatics research could benefit from XAI methods and how they could help improve decision fairness.
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single-object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the lack of research interest in this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows researchers to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint tracking of multiple objects simultaneously. Furthermore, we propose a Transformer-based GOT tracker, TaMOs, capable of joint processing of multiple objects through shared computation. TaMOs achieves a 4x faster run-time in the case of 10 concurrent objects compared to tracking each object independently and outperforms existing single-object trackers on our new benchmark. Finally, TaMOs achieves highly competitive results on single-object GOT datasets, setting a new state-of-the-art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, and trained models will be made publicly available.